Co-training applied in automatic term extraction
Abstract
This paper discusses the use of a setting similar to co-training in automatic terminology processing. Two aspects of terms, the internal aspect (i.e., linguistic and statistical properties) and the external aspect (i.e., contexts), are used alternately in a bootstrapping manner to extract progressively more terms and context patterns. The results show that, using only a small set of seed terms, the method extracts terms with higher success rates than other methods. Furthermore, the method can also discover interesting context patterns, which can be used in other terminology processing applications.
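The alternation between the two views can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no algorithmic detail, so the pattern shape (one word on each side of a single-token term), the `looks_like_term` filter standing in for the internal view, the `min_pattern_count` threshold, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical stopword list; a real system would use proper linguistic filters.
STOPWORDS = {"the", "a", "of", "is", "in", "and", "to"}


def pad(sentence):
    """Add boundary markers so every token has a left and right neighbour."""
    return ["<s>"] + list(sentence) + ["</s>"]


def looks_like_term(token):
    """Stand-in for the internal view (linguistic/statistical properties)."""
    return token.isalpha() and token not in STOPWORDS and len(token) > 2


def pattern_counts(sentences, terms):
    """External view: count (left, right) contexts around known terms."""
    counts = Counter()
    for sentence in sentences:
        padded = pad(sentence)
        for i in range(1, len(padded) - 1):
            if padded[i] in terms:
                counts[(padded[i - 1], padded[i + 1])] += 1
    return counts


def bootstrap(sentences, seeds, rounds=5, min_pattern_count=2):
    """Alternate between learning context patterns and extracting new terms."""
    terms, patterns = set(seeds), set()
    for _ in range(rounds):
        size_before = (len(terms), len(patterns))
        # Step 1: known terms vote for the contexts they occur in.
        counts = pattern_counts(sentences, terms)
        patterns |= {p for p, c in counts.items() if c >= min_pattern_count}
        # Step 2: accepted contexts propose candidates; the internal filter decides.
        for sentence in sentences:
            padded = pad(sentence)
            for i in range(1, len(padded) - 1):
                context = (padded[i - 1], padded[i + 1])
                if context in patterns and looks_like_term(padded[i]):
                    terms.add(padded[i])
        if (len(terms), len(patterns)) == size_before:
            break  # nothing new was learned in this round
    return terms, patterns


# Toy corpus (invented for this example).
corpus = [
    ["the", "enzyme", "binds", "dna"],
    ["the", "protein", "binds", "dna"],
    ["the", "kinase", "binds", "dna"],
    ["a", "ligand", "is", "small"],
]
terms, patterns = bootstrap(corpus, seeds={"enzyme", "protein"})
print(sorted(terms))
print(sorted(patterns))
```

In this toy run the two seeds share the context ("the", "binds"), which then proposes "kinase" as a new term; "ligand" is not proposed because no accepted pattern surrounds it. A realistic system would handle multi-word terms and score candidates and patterns instead of using hard thresholds.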
Similar resources
Using machine learning to perform automatic term recognition
In this paper a machine learning approach is applied to Automatic Term Recognition (ATR). Similar approaches have been successfully used in Automatic Keyword Extraction (AKE). Using a dataset consisting of Swedish patent texts and validated terms belonging to these texts, unigrams and bigrams are extracted and annotated with linguistic and statistical feature values. Experiments using a varying...
Combining Optimal and Atomic Decomposition of Terminology Association graphs
We introduce novel approaches to graph decomposition based on optimal separators and atoms generated by minimal clique separators. The decomposition process is applied to co-word graphs extracted from the Web of Science database. Two types of graphs are considered: co-keyword graphs based on the human indexation of abstracts and terminology graphs based on semi-automatic term extraction from abstra...
Term Extraction and Mining of Term Relations from Unrestricted Texts in the Financial Domain
In this paper, we present an unsupervised hybrid text-mining approach to the automatic acquisition of domain-relevant terms and their relations. We deploy the TF-IDF-based term classification method to acquire domain-relevant terms. Further, we apply two strategies in order to learn lexico-syntactic patterns which indicate paradigmatic and domain-relevant syntagmatic relations between the extracted ter...
Automatic segmentation of glioma tumors from BraTS 2018 challenge dataset using a 2D U-Net network
Background: Glioma is the most common primary brain tumor, and early detection of tumors is important in treatment planning for the patient. Precise segmentation of the tumor and intratumoral areas on MRI by a radiologist is the first step in the diagnosis; in addition to being time-consuming, it can yield different diagnoses from different physicians. The aim of this study...
Combining statistics on n-grams for automatic term recognition
This paper presents work in progress on the development of an automatic term recognition (ATR) system built around the Corpus Científico-Técnico (CCT). Terms are modeled using three non-correlated dimensions: unithood, domainhood and usage, applied to a set of n-grams automatically extracted from the corpus. These dimensions are combined with a supervised machine learning algorithm in order ...
Publication date: 2003